DETAILED ACTION
Claim Rejections - 35 USC § 103 

1. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

2. Claims 23-31, 33-45, 47-91 and 93-100 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over LADD (U.S. Patent 6,269,336) in view of "Multimedia Content 
Description in the InfoPyramid" by Li et al. 

As to claim 23, LADD teaches a conversational browser (voice browser), 
comprising: means for interpreting a user command (voice input) and for generating a 
request (content request) to access a CML file (markup language document), wherein 
CML comprises meta-information implementing a conversational dialog for interaction 
with the user in a plurality of user interface modalities (via the network access apparatus
of the system, which allows the user to access (i.e., view and/or hear) the information retrieved
from the information source, wherein the information is in the form of machine readable
data, human readable data, audio or speech communications, textual information,
graphical or image data, etc.) (col. 3, lines 40-46; col. 4, lines 36-43;
col. 4, lines 52-58); and a CML processor (parsing unit) for parsing and interpreting a 
CML file to render the conversational dialog in one or more of the plurality of user 
interface modalities (col. 11, lines 25-49; col. 11, line 66 - col. 12, line 24; col. 3, lines 
40-46; col. 4, lines 36-43; col. 4, lines 52-58). It would be obvious to one skilled in the 
art that the browser's means for interpreting is a voice interface for receiving the voice
commands. However, LADD does not explicitly mention that the CML itself comprises 
both GUI modality and speech modality that implements a conversational dialog to 
enable interaction with a user. 

Li teaches a conversational markup language file or application that is accessible 
to a browser wherein the CML enables interaction with the user in a plurality of user 
interface modalities including a GUI modality and speech modality (via multimedia 
content is usually not in a single media format, or modality) (pg. 3789, InfoPyramids, 
Multi-modal) (see also wherein a news story is represented at different resolutions and 
is comprised of different modalities such that a user can query the database for the 
news story) (pg. 3792, 5.4 TV News Application). It would be obvious to one of ordinary 
skill in the art that the document of LADD is a news story of Li in order to retrieve and/or 
access the requested data / story. Therefore, it would be obvious to one of ordinary skill 
in the art to combine the teachings of LADD with the teachings of Li in order to facilitate
the search, retrieval, manipulation, and transmission of
multimedia data by providing a hierarchy for content descriptors (abstract; pg. 3789, 
InfoPyramids). 

As to claims 24 and 25, LADD teaches a conversational browser (voice browser) 
of a computing device that provides a conversational user interface to render a 
conversational dialog (col. 11, lines 25-49). LADD also teaches that variations and 
modifications may be practiced on the system (col. 2, lines 10-14). However, LADD 
does not teach that the browser executes on top of an operating platform. Official
Notice is taken that it is well known in the art that a browser executes on a virtual
machine to send and handle remote requests, and it therefore would be obvious in view of
LADD in order to send and handle voice requests.

As to claims 26-29, LADD teaches a dialog manager (VRU server / interpreter 
unit) for managing and controlling the conversational dialog wherein the dialog manager 
allocates conversational engines (text to speech unit / automatic speech recognition
unit) for rendering the conversational dialog by meta-information of a CML file (col. 9, 
lines 1-53; col. 13, lines 41-60). 

As to claims 30, 31, 33 and 34, LADD teaches the user input command (voice
input) can be input in the one or more user interface modalities (col. 11, lines 31-35; col.
3, lines 40-46; col. 4, lines 36-43; col. 4, lines 52-58; col. 2, lines 48-66), and the CML is
implemented in a declarative format encapsulating multi-modal dialog (col. 16, lines 5-56).
Official Notice is taken that it is well known in the art that XML is a markup
language, and it therefore would be obvious that the markup language of LADD is XML.

As to claims 35-38, LADD teaches the input commands to the browser are voice 
commands (col. 11, lines 26-36). Therefore, it would be obvious to one skilled in the art
that, since the commands are voice commands that navigate to a web page, the
browser implements "what you hear is what you can say", "say what you heard",
"say what you will hear", and "mixed initiative" dialog formats.

As to claim 80, LADD teaches a method for accessing information, comprising 
the steps of: processing an input command (voice input) with at least one of a plurality 
of conversational engines (network fetcher); generating a request (content request) 
based on the processed input command (voice input) to access a CML file (markup 
language document) from a content server (mark up language server), the CML file 
comprising meta-information to implement a conversational dialog in a plurality of user 
interface modalities (via the network access apparatus of the system, which allows the user to
access (i.e., view and/or hear) the information retrieved from the information source,
wherein the information is in the form of machine readable data, human readable data,
audio or speech communications, textual information, graphical or image data, etc.)
(col. 3, lines 40-46; col. 4, lines 36-43; col. 4, lines 52-58); transmitting
the request (content request) and accessing the requested CML file from a content 
server using a standard networking protocol; and processing the meta-information 
comprising the CML file to render the conversational dialog in one or more of a plurality 
of user interface modalities (via parsing the information and executing the file using the 
browser to display and/or play sound) (col. 11, lines 25-49; col. 11, line 66 - col. 12,
line 25; col. 14, lines 3-17; col. 2, lines 20-39; col. 2, line 59 - col. 3, line 5). However, 
LADD does not explicitly mention that the CML itself comprises both GUI modality and 
speech modality that implements a conversational dialog to enable interaction with a
user. 

Li teaches a conversational markup language file or application that is accessible 
to a browser wherein the CML enables interaction with the user in a plurality of user 
interface modalities including a GUI modality and speech modality (via multimedia 
content is usually not in a single media format, or modality) (pg. 3789, InfoPyramids, 
Multi-modal) (see also wherein a news story is represented at different resolutions and 
is comprised of different modalities such that a user can query the database for the 
news story) (pg. 3792, 5.4 TV News Application). It would be obvious to one of ordinary 
skill in the art that the document of LADD is a news story of Li in order to retrieve and/or 
access the requested data / story. Therefore, it would be obvious to one of ordinary skill 
in the art to combine the teachings of LADD with the teachings of Li in order to facilitate
the search, retrieval, manipulation, and transmission of
multimedia data by providing a hierarchy for content descriptors (abstract; pg. 3789, 
InfoPyramids). 

As to claims 81 and 82, LADD teaches a conversational browser (voice browser)
of a computing device that executes the steps (col. 11, lines 25-49). LADD also teaches
that variations and modifications may be practiced on the system (col. 2, lines 10-14).
However, LADD does not teach that the browser executes on top of an operating
platform. Official Notice is taken that it is well known in the art that a browser
executes on a virtual machine to send and handle remote requests, and it therefore would
be obvious in view of LADD in order to send and handle voice requests.

As to claims 84 and 85, LADD teaches customizing the CML file (markup 
language document) based on the conversational capabilities of the browser (the 
structure of the language can be designed specifically for voice applications); and 
registering the capabilities with the content server (via storing the files on markup 
language servers) (col. 15, line 60 - col. 16, line 21).

As to claim 83, LADD teaches the steps are distributed using a conversational 
engine (text to speech unit / automatic speech recognition unit) and conversational
arguments (request data / document attributes) (col. 11, lines 25-49; col. 9, lines 1-53;
col. 13, lines 41-60). 

As to claims 86-88, LADD teaches transcoding legacy content of the content
server (information from the information sources) into CML based on predefined 
transcoding rules (via the parser unit) (col. 12, lines 15-24; col. 5, lines 8-11). 



As to claim 89, LADD teaches processing the meta-information comprises 
playing back an audio file or generating synthesized speech output (col. 4, lines 50-61). 

As to claims 90, 91 and 93, LADD teaches the CML is implemented in a
declarative format encapsulating multi-modal dialog (col. 16, lines 5-56). Official Notice
is taken that it is well known in the art that XML is a markup language, and it therefore
would be obvious that the markup language of LADD is XML (see Li reference).

As to claims 94-100, LADD teaches the CML (via markup language document) 
comprises one of (1) a top level element that groups other CML elements; (2) an 
element that specifies output to be spoken to the user; (3) a menu element for
encapsulating a menu that presents the user with a list of choices wherein each choice 
is associated with a target address identifying a CML element to visit if the 
corresponding choice is selected; (4) a form element for encapsulating a form that 
allows the user to input at least one item of information and transmit the at least one 
item of information to a target address; and (5) a combination thereof (col. 16, line 29 -
col. 17, line 49). 

As to claim 39, LADD teaches a system for accessing information (information), 
comprising: a content server (mark up language server) comprising content pages 
(mark up language documents), wherein the content pages are implemented using a 
CML (mark up language) to describe a conversational dialog for interaction with a user 
in a plurality of user interface modalities (view and audio) including a GUI modality and 
speech modality (via the network access apparatus of the system, which allows the user to
access (i.e., view and/or hear) the information retrieved from the information source,
wherein the information is in the form of machine readable data, human readable data,
audio or speech communications, textual information, graphical or image data, etc.)
(col. 15, line 60 - col. 16, line 57; col. 3, lines 40-46; col. 4, lines 36-43;
col. 4, lines 52-58); and a conversational browser (voice browser) for processing a CML 
page received from the content server to render its conversational dialog in one or more 
of the plurality of user interface modalities (col. 11, lines 25-49; col. 11, line 66 - col. 12,
line 24; col. 3, lines 40-46; col. 4, lines 36-43; col. 4, lines 52-58). However, LADD does 
not teach that the browser executes on top of an operating platform. Official Notice is
taken that it is well known in the art that a browser executes on a virtual machine to
send and handle remote requests, and it therefore would be obvious in view of LADD in
order to send and handle voice requests. In addition, LADD does not explicitly mention
that the CML itself comprises both GUI modality and speech modality that implements a 
conversational dialog to enable interaction with a user. 

Li teaches a conversational markup language file or application that is accessible 
to a browser wherein the CML enables interaction with the user in a plurality of user 
interface modalities including a GUI modality and speech modality (via multimedia 
content is usually not in a single media format, or modality) (pg. 3789, InfoPyramids, 
Multi-modal) (see also wherein a news story is represented at different resolutions and 
is comprised of different modalities such that a user can query the database for the 
news story) (pg. 3792, 5.4 TV News Application). It would be obvious to one of ordinary 
skill in the art that the document of LADD is a news story of Li in order to retrieve and/or 
access the requested data / story. Therefore, it would be obvious to one of ordinary skill 
in the art to combine the teachings of LADD with the teachings of Li in order to facilitate
the search, retrieval, manipulation, and transmission of
multimedia data by providing a hierarchy for content descriptors (abstract; pg. 3789, 
InfoPyramids). 

As to claims 40-44, LADD teaches the system comprises an IVR system 
implemented in CML (system capable of handling a voice markup language document) 
(col. 11, lines 25-49; col. 14, lines 3-17) and accessible over a packet-switched network
using a standard network protocol (col. 2, lines 26-39). 

As to claims 45 and 47-51, LADD teaches the CML is implemented in a
declarative format encapsulating multi-modal and speech dialog (col. 16, lines 5-56; col.
16, line 58 - col. 17, line 49). Official Notice is taken that it is well known in the art
that XML is a markup language, and it therefore would be obvious that the markup
language of LADD is XML (see Li reference).

As to claims 52-54, LADD teaches a conversational browser (voice browser) on a 
computing device communicating over a communications network (col. 11, lines 25-49). 
LADD also teaches that variations and modifications may be practiced on the system 
(col. 2, lines 10-14). However, LADD does not teach that the browser executes on top 
of a virtual machine. Official Notice is taken that it is well known in the art that a
browser executes on a virtual machine to send and handle remote requests, and it
therefore would be obvious in view of LADD in order to send and handle voice requests.

As to claims 55 and 56, LADD teaches standard network protocols are utilized for 
accessing CML content pages from the content server (col. 5, lines 37-62; col. 2, lines 
26-39). 

As to claims 57-62, LADD teaches transcoding legacy content of the content 
server (information from the information sources) into CML based on predefined 
transcoding rules (via the parser unit) (col. 12, lines 15-24; col. 5, lines 8-11). 

As to claims 63-71, LADD teaches CML (via markup language document) 
comprises a plurality of capability-based frames, an active link, a link to conversational 
data files, a link to at least one distributed conversational engine, a link to an audio file 
for playback, a confirmation message tag, TTS markup, scripting language and 
imperative code, and a link to one of a plug-in or an applet for executing a 
conversational task (col. 16, line 29 - col. 17, line 49). 

As to claims 72-79, LADD teaches the CML (via markup language document) 
comprises one of (1) a top level element that groups other CML elements; (2) an 
element that specifies output to be spoken to the user; (3) a menu element for
encapsulating a menu that presents the user with a list of choices wherein each choice
is associated with a target address identifying a CML element to visit if the
corresponding choice is selected; (4) a form element for encapsulating a form that
allows the user to input at least one item of information and transmit the at least one
item of information to a target address; and (5) a combination thereof (col. 16, line 29 -
col. 17, line 49). 

Response to Arguments 

3. Applicant's arguments filed April 12, 2007 have been fully considered but they 
are not persuasive. Applicant argues that the combination of LADD and Li does not
disclose or suggest a conversational browser or method for processing a CML
document and rendering its conversational dialog in one or more of a plurality of user
interface modalities. Applicant further states that the teachings of Li are irrelevant in that
Li discloses nothing more than a content description language for multimedia that
improves searching, indexing, and managing multimedia contents and is not related to
the process of parsing and interpreting a CML file / application to render the
conversational dialog in a plurality of user interface modalities. The examiner
disagrees. LADD taught a network access apparatus that allows the user to access (i.e.,
view and/or hear) the information retrieved from the information source and provide the 
information to the user as machine readable data, human readable data, audio or 
speech communications, textual information, graphical or image data, etc. (col. 3, lines 
40-65). The network access apparatus includes a voice or web browser. Upon 
receiving an input or command from the user, the electronic network establishes a 
connection to the information source and retrieves at least a portion of the information
from the destination of the information source (col. 4, lines 36-52). The electronic 
network processes the information and then provides an output to the user based upon 
the retrieved information wherein the output can include a speech communication, 
textual information, and/or graphical information (col. 4, lines 50-61). The information 
source includes a markup language server that includes a database, scripts, and
markup language documents or pages (col. 11, lines 12-19). A voice browser sends
requests to the information source, which sends back at least a portion of the requested
information, which can be text content, markup language documents or pages, non-text
content, dialogs, audio sample data, recognition grammars, etc., wherein the voice
browser then parses and interprets the information via the parse unit and the interpreter
unit (col. 11, lines 36-49; col. 11, line 66 - col. 12, line 24; col. 13, lines 41-65).
Therefore, LADD teaches a conversational browser or method for parsing a document 
and rendering its dialog in one or more user interface modalities. However, LADD does 
not explicitly mention that the markup language documents are in a plurality of user 
interface modalities. Li teaches that information (multimedia information) written in a
markup language (XML) is disclosed in a plurality of modalities when requested by a
user; hence the markup language comprises meta-information implementing a dialog in
a plurality of modalities. In particular, pg. 3792 details that the InfoPyramid can be
represented in XML wherein the news story's content is multi-modal, and depending on 
the query, the right modality has to be exposed to the search query. Similarly, in reply 
to a query, different modalities and/or resolutions may be returned. LADD's teachings
use a browser to access (search for) information from information sources. Li's
teachings detail that this information is formatted in a multi-modality
language to enable the user to interact in a plurality of modalities. Therefore, the
combination adequately teaches the limitations of the claims, and the references can be properly
combined to improve the searching, indexing, and managing of multimedia content
wherein the content descriptors are parsed and interpreted by the parsing and
interpreter units of the voice browser of LADD. The rejection is maintained.

Conclusion 

4. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Lewis A. Bullock, Jr. whose telephone number is (571)
272-3759. The examiner can normally be reached on Monday-Friday, 8:30 a.m. - 5:00
p.m.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Meng An, can be reached on (571) 272-3756. The fax phone number for the
organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 